
Putting Bandits Into Context: How Function Learning Supports Decision Making

Abstract

The authors introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments. In this novel paradigm, participants repeatedly choose between multiple options in order to maximize their rewards. The options are described by a number of contextual features which are predictive of the rewards through initially unknown functions. From their experience with choosing options and observing the consequences of their decisions, participants can learn about the functional relation between contexts and rewards and improve their decision strategy over time. In three experiments, the authors explore participants’ behavior in such learning environments. They predict participants’ behavior by context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process and linear regression) learning approaches combined with different choice strategies. Participants are mostly able to learn about the context-reward functions and their behavior is best described by a Gaussian process learning strategy which generalizes previous experience to similar instances. In a relatively simple task with binary features, they seem to combine this learning with a probability of improvement decision strategy which focuses on alternatives that are expected to lead to an improvement upon a current favorite option. In a task with continuous features that are linearly related to the rewards, participants seem to more explicitly balance exploration and exploitation. Finally, in a difficult learning environment where the relation between features and rewards is nonlinear, some participants are again well-described by a Gaussian process learning strategy, whereas others revert to context-blind strategies.
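As a concrete illustration of the task and the best-fitting model class, here is a minimal simulation sketch. It is not the authors' implementation; the option count, kernel choice, improvement margin, and all names are illustrative assumptions. One Gaussian process regression per option learns that option's context-reward function from experience, and choices follow a probability-of-improvement rule relative to the current favorite option.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
N_OPTIONS, N_FEATURES, N_TRIALS = 3, 2, 50
XI = 0.05  # required improvement margin over the favorite (illustrative)

# Hypothetical ground truth: each option's reward is an initially
# unknown function of the shared contextual features (linear + noise).
true_weights = rng.normal(size=(N_OPTIONS, N_FEATURES))

def reward(option, context):
    return float(true_weights[option] @ context + rng.normal(scale=0.1))

# One GP per option; fixed kernel hyperparameters keep the sketch simple.
gps = [GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                alpha=1e-2, optimizer=None)
       for _ in range(N_OPTIONS)]
data = [([], []) for _ in range(N_OPTIONS)]  # observed (contexts, rewards)

total = 0.0
for t in range(N_TRIALS):
    context = rng.uniform(-1.0, 1.0, size=N_FEATURES)

    # Posterior mean/sd of each option's reward; flat prior before any data.
    mu = np.zeros(N_OPTIONS)
    sd = np.ones(N_OPTIONS)
    for k in range(N_OPTIONS):
        if data[k][0]:
            m, s = gps[k].predict(context.reshape(1, -1), return_std=True)
            mu[k], sd[k] = m[0], max(s[0], 1e-9)

    # Probability of improving on the current favorite by at least XI.
    favorite = mu.max()
    pi = norm.cdf((mu - favorite - XI) / sd)
    choice = int(np.argmax(pi))

    # Observe the outcome and update the chosen option's GP.
    r = reward(choice, context)
    data[choice][0].append(context)
    data[choice][1].append(r)
    gps[choice].fit(np.array(data[choice][0]), np.array(data[choice][1]))
    total += r

print(f"average reward over {N_TRIALS} trials: {total / N_TRIALS:.3f}")
```

The small margin XI makes under-explored options with high posterior uncertainty competitive with the favorite; without it, a strict probability-of-improvement rule would always re-select the current favorite.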